{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NpJd3dlOCStH"
      },
      "source": [
        "\u003ca href=\"https://colab.research.google.com/github/magenta/ddsp/blob/main/ddsp/colab/tutorials/1_synths_and_effects.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hMqWDc_m6rUC"
      },
      "source": [
        "\n",
        "##### Copyright 2021 Google LLC.\n",
        "\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VNhgka4UKNjf"
      },
      "outputs": [],
      "source": [
        "# Copyright 2021 Google LLC. All Rights Reserved.\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     http://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License.\n",
        "# =============================================================================="
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZFIqwYGbZ-df"
      },
      "source": [
        "# DDSP Synths and Effects\n",
        "\n",
        "This notebook demonstrates the use of several of the Synths and Effects Processors in the DDSP library. While the core functions are also directly accessible through `ddsp.core`, using Processors is the preferred API for end-2-end training. \n",
        "\n",
        "As demonstrated in the [0_processors.ipynb](colab/tutorials/0_processors.ipynb) tutorial, Processors contain the necessary nonlinearities and preprocessing in their `get_controls()` method to convert generic neural network outputs into valid processor controls, which are then converted to signal by `get_signal()`. The two methods are called in series by `__call__()`.\n",
        "\n",
        "While each processor is capable of a wide range of expression, we focus on simple examples here for clarity."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "jKDRJa6jztLT"
      },
      "outputs": [],
      "source": [
        "#@title Install and import dependencies\n",
        "\n",
        "%tensorflow_version 2.x\n",
        "!pip install -qU ddsp\n",
        "\n",
        "# Ignore a bunch of deprecation warnings\n",
        "import warnings\n",
        "warnings.filterwarnings(\"ignore\")\n",
        "\n",
        "import ddsp\n",
        "import ddsp.training\n",
        "from ddsp.colab.colab_utils import (play, record, specplot, upload, \n",
        "                                    DEFAULT_SAMPLE_RATE)\n",
        "import matplotlib.pyplot as plt\n",
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "import tensorflow_datasets as tfds\n",
        "\n",
        "sample_rate = DEFAULT_SAMPLE_RATE  # 16000"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6jXC-hm09dyl"
      },
      "source": [
        "# Synths\n",
        "\n",
        "Synthesizers, located in `ddsp.synths`, take network outputs and produce a signal (usually used as audio). "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "256dCv-T9xHi"
      },
      "source": [
        "## Harmonic"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5HxKR0UTGpyn"
      },
      "source": [
        "The harmonic synthesizer models a sound as a linear combination of harmonic sinusoids. Amplitude envelopes are generated with 50% overlapping hann windows. The final audio is cropped to `n_samples`.\n",
        "\n",
        "Inputs:\n",
        "* `amplitudes`: Amplitude envelope of the synthesizer output.\n",
        "* `harmonic_distribution`: Normalized amplitudes of each harmonic.\n",
        "* `frequencies`: Frequency in Hz of base oscillator."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "MQ-ZK2gI-df6"
      },
      "outputs": [],
      "source": [
        "n_frames = 1000\n",
        "hop_size = 64\n",
        "n_samples = n_frames * hop_size\n",
        "\n",
        "# Amplitude [batch, n_frames, 1].\n",
        "# Make amplitude linearly decay over time.\n",
        "amps = np.linspace(1.0, -3.0, n_frames)\n",
        "amps = amps[np.newaxis, :, np.newaxis]\n",
        "\n",
        "# Harmonic Distribution [batch, n_frames, n_harmonics].\n",
        "# Make harmonics decrease linearly with frequency.\n",
        "n_harmonics = 20\n",
        "harmonic_distribution = np.ones([n_frames, 1]) * np.linspace(1.0, -1.0, n_harmonics)[np.newaxis, :]\n",
        "harmonic_distribution = harmonic_distribution[np.newaxis, :, :]\n",
        "\n",
        "# Fundamental frequency in Hz [batch, n_frames, 1].\n",
        "f0_hz = 440.0 * np.ones([1, n_frames, 1])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1ZPz0Ej8-xKN"
      },
      "outputs": [],
      "source": [
        "# Create synthesizer object.\n",
        "harmonic_synth = ddsp.synths.Harmonic(n_samples=n_samples,\n",
        "                                      scale_fn=ddsp.core.exp_sigmoid,\n",
        "                                      sample_rate=sample_rate)\n",
        "\n",
        "# Generate some audio.\n",
        "audio = harmonic_synth(amps, harmonic_distribution, f0_hz)\n",
        "\n",
        "# Listen.\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4HmOvGzs9zHB"
      },
      "source": [
        "## Filtered Noise\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6o8otCCoCRwe"
      },
      "source": [
        "\n",
        "The filtered noise synthesizer is a subtractive synthesizer that shapes white noise with a series of time-varying filter banks. \n",
        "\n",
        "Inputs:\n",
        "* `magnitudes`: Amplitude envelope of each filter bank (linearly spaced from 0Hz to the Nyquist frequency)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "eQSsBpCw_0f6"
      },
      "outputs": [],
      "source": [
        "n_frames = 250\n",
        "n_frequencies = 1000\n",
        "n_samples = 64000\n",
        "\n",
        "# Bandpass filters, [n_batch, n_frames, n_frequencies].\n",
        "magnitudes = [tf.sin(tf.linspace(0.0, w, n_frequencies)) for w in np.linspace(8.0, 80.0, n_frames)]\n",
        "magnitudes = 0.5 * tf.stack(magnitudes)**4.0\n",
        "magnitudes = magnitudes[tf.newaxis, :, :]"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Pz8vtRALBeCW"
      },
      "outputs": [],
      "source": [
        "# Create synthesizer object.\n",
        "filtered_noise_synth = ddsp.synths.FilteredNoise(n_samples=n_samples, \n",
        "                                                 scale_fn=None)\n",
        "\n",
        "# Generate some audio.\n",
        "audio = filtered_noise_synth(magnitudes)\n",
        "\n",
        "# Listen.\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d-K69x2790ad"
      },
      "source": [
        "## Wavetable"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xRWOXvZ5Gsek"
      },
      "source": [
        "The wavetable synthesizer generates audio through interpolative lookup from small chunks of waveforms (wavetables) provided by the network. In principle, it is very similar to the `Harmonic` synth, but with a parameterization in the waveform domain and generation using linear interpolation vs. cumulative summation of sinusoid phases.\n",
        "\n",
        "Inputs:\n",
        "* `amplitudes`: Amplitude envelope of the synthesizer output.\n",
        "* `wavetables`: A series of wavetables that are interpolated to cover n_samples.\n",
        "* `frequencies`: Frequency in Hz of base oscillator."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "zfYZ9P5yCjO8"
      },
      "outputs": [],
      "source": [
        "n_samples = 64000\n",
        "n_wavetable = 2048\n",
        "n_frames = 100\n",
        "\n",
        "# Amplitude [batch, n_frames, 1].\n",
        "amps = tf.linspace(0.5, 1e-3, n_frames)[tf.newaxis, :, tf.newaxis]\n",
        "\n",
        "# Fundamental frequency in Hz [batch, n_frames, 1].\n",
        "f0_hz = 110 * tf.linspace(1.5, 1, n_frames)[tf.newaxis, :, tf.newaxis]\n",
        "\n",
        "# Wavetables [batch, n_frames, n_wavetable].\n",
        "# Sin wave\n",
        "wavetable_sin = tf.sin(tf.linspace(0.0, 2.0 * np.pi, n_wavetable))\n",
        "wavetable_sin = wavetable_sin[tf.newaxis, tf.newaxis, :]\n",
        "\n",
        "# Square wave\n",
        "wavetable_square = tf.cast(wavetable_sin \u003e 0.0, tf.float32) * 2.0 - 1.0\n",
        "\n",
        "# Combine them and upsample to n_frames.\n",
        "wavetables = tf.concat([wavetable_square, wavetable_sin], axis=1)\n",
        "wavetables = ddsp.core.resample(wavetables, n_frames)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3jOs_RsfCrd9"
      },
      "outputs": [],
      "source": [
        "# Create synthesizer object.\n",
        "wavetable_synth = ddsp.synths.Wavetable(n_samples=n_samples,\n",
        "                                        sample_rate=sample_rate,\n",
        "                                        scale_fn=None)\n",
        "\n",
        "# Generate some audio.\n",
        "audio = wavetable_synth(amps, wavetables, f0_hz)\n",
        "\n",
        "# Listen, notice the aliasing artifacts from linear interpolation.\n",
        "play(audio)\n",
        "specplot(audio)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "C_lPZWTN92YJ"
      },
      "source": [
        "# Effects\n",
        "\n",
        "Effects, located in `ddsp.effects` are different in that they take network outputs to transform a given audio signal. Some effects, such as Reverb, optionally have trainable parameters of their own."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lay_6Ldw93ZL"
      },
      "source": [
        "## Reverb\n",
        "\n",
        "There are several types of reverberation processors in ddsp.\n",
        "\n",
        "* Reverb\n",
        "* ExpDecayReverb\n",
        "* FilteredNoiseReverb\n",
        "\n",
        "Unlike other processors, reverbs also have the option to treat the impulse response as a 'trainable' variable, and not require it from network outputs. This is helpful for instance if the room environment is the same for the whole dataset. To make the reverb trainable, just pass the kwarg `trainable=True` to the constructor"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tfNrv4MQXMW3"
      },
      "outputs": [],
      "source": [
        "#@markdown Record or Upload Audio\n",
        "\n",
        "record_or_upload = \"Record\" #@param [\"Record\", \"Upload (.mp3 or .wav)\"]\n",
        "\n",
        "record_seconds =   5#@param {type:\"number\", min:1, max:10, step:1}\n",
        "\n",
        "if record_or_upload == \"Record\":\n",
        "  audio = record(seconds=record_seconds)\n",
        "else:\n",
        "  # Load audio sample here (.mp3 or .wav3 file)\n",
        "  # Just use the first file.\n",
        "  filenames, audios = upload()\n",
        "  audio = audios[0]\n",
        "\n",
        "# Add batch dimension\n",
        "audio = audio[np.newaxis, :]\n",
        "\n",
        "# Listen.\n",
        "specplot(audio)\n",
        "play(audio)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "xK951CM1XTGT"
      },
      "outputs": [],
      "source": [
        "# Let's just do a simple exponential decay reverb.\n",
        "reverb = ddsp.effects.ExpDecayReverb(reverb_length=48000)\n",
        "\n",
        "gain = [[-2.0]]\n",
        "decay = [[2.0]]\n",
        "# gain: Linear gain of impulse response. Scaled by self._gain_scale_fn.\n",
        "# decay: Exponential decay coefficient. The final impulse response is\n",
        "#          exp(-(2 + exp(decay)) * time) where time goes from 0 to 1.0 over the\n",
        "#          reverb_length samples.\n",
        "\n",
        "audio_out = reverb(audio, gain, decay)\n",
        "\n",
        "# Listen.\n",
        "specplot(audio_out)\n",
        "play(audio_out)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Skfxkn59VS01"
      },
      "outputs": [],
      "source": [
        "# Just the filtered noise reverb can be quite expressive.\n",
        "reverb = ddsp.effects.FilteredNoiseReverb(reverb_length=48000,\n",
        "                                          scale_fn=None)\n",
        "\n",
        "# Rising gaussian filtered band pass.\n",
        "n_frames = 1000\n",
        "n_frequencies = 100\n",
        "\n",
        "frequencies = np.linspace(0, sample_rate / 2.0, n_frequencies)\n",
        "center_frequency = 4000.0 * np.linspace(0, 1.0, n_frames)\n",
        "width = 500.0\n",
        "gauss = lambda x, mu: 2.0 * np.pi * width**-2.0 * np.exp(- ((x - mu) / width)**2.0)\n",
        "\n",
        "# Actually make the magnitudes.\n",
        "magnitudes = np.array([gauss(frequencies, cf) for cf in center_frequency])\n",
        "magnitudes = magnitudes[np.newaxis, ...]\n",
        "magnitudes /= magnitudes.sum(axis=-1, keepdims=True) * 5\n",
        "\n",
        "# Apply the reverb.\n",
        "audio_out = reverb(audio, magnitudes)\n",
        "\n",
        "# Listen.\n",
        "specplot(audio_out)\n",
        "play(audio_out)\n",
        "plt.matshow(np.rot90(magnitudes[0]), aspect='auto')\n",
        "plt.title('Impulse Response Frequency Response')\n",
        "plt.xlabel('Time')\n",
        "plt.ylabel('Frequency')\n",
        "plt.xticks([])\n",
        "_ = plt.yticks([])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lc_md-cD99y6"
      },
      "source": [
        "## FIR Filter\n",
        "\n",
        "Linear time-varying finite impulse response (LTV-FIR) filters are a broad class of filters that can vary over time."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "Gv5GQX5oeL-f"
      },
      "outputs": [],
      "source": [
        "#@markdown Record or Upload Audio\n",
        "\n",
        "record_or_upload = \"Record\" #@param [\"Record\", \"Upload (.mp3 or .wav)\"]\n",
        "\n",
        "record_seconds =   5#@param {type:\"number\", min:1, max:10, step:1}\n",
        "\n",
        "if record_or_upload == \"Record\":\n",
        "  audio = record(seconds=record_seconds)\n",
        "else:\n",
        "  # Load audio sample here (.mp3 or .wav3 file)\n",
        "  # Just use the first file.\n",
        "  filenames, audios = upload()\n",
        "  audio = audios[0]\n",
        "\n",
        "# Add batch dimension\n",
        "audio = audio[np.newaxis, :]\n",
        "\n",
        "# Listen.\n",
        "specplot(audio)\n",
        "play(audio)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YAJYBSUSfADI"
      },
      "outputs": [],
      "source": [
        "# Let's make an oscillating gaussian bandpass filter.\n",
        "fir_filter = ddsp.effects.FIRFilter(scale_fn=None)\n",
        "\n",
        "# Make up some oscillating gaussians.\n",
        "n_seconds = audio.size / sample_rate\n",
        "frame_rate = 100  # Hz\n",
        "n_frames = int(n_seconds * frame_rate)\n",
        "n_samples = int(n_frames * sample_rate / frame_rate)\n",
        "audio_trimmed = audio[:, :n_samples]\n",
        "\n",
        "n_frequencies = 1000\n",
        "frequencies = np.linspace(0, sample_rate / 2.0, n_frequencies)\n",
        "\n",
        "lfo_rate = 0.5  # Hz\n",
        "n_cycles = n_seconds * lfo_rate\n",
        "center_frequency = 1000 + 500 * np.sin(np.linspace(0, 2.0*np.pi*n_cycles, n_frames))\n",
        "width = 500.0\n",
        "gauss = lambda x, mu: 2.0 * np.pi * width**-2.0 * np.exp(- ((x - mu) / width)**2.0)\n",
        "\n",
        "\n",
        "# Actually make the magnitudes.\n",
        "magnitudes = np.array([gauss(frequencies, cf) for cf in center_frequency])\n",
        "magnitudes = magnitudes[np.newaxis, ...]\n",
        "magnitudes /= magnitudes.max(axis=-1, keepdims=True)\n",
        "\n",
        "# Filter.\n",
        "audio_out = fir_filter(audio_trimmed, magnitudes)\n",
        "\n",
        "# Listen.\n",
        "play(audio_out)\n",
        "specplot(audio_out)\n",
        "_ = plt.matshow(np.rot90(magnitudes[0]), aspect='auto')\n",
        "plt.title('Frequency Response')\n",
        "plt.xlabel('Time')\n",
        "plt.ylabel('Frequency')\n",
        "plt.xticks([])\n",
        "_ = plt.yticks([])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mkjWUGpZ95Mr"
      },
      "source": [
        "## ModDelay\n",
        "\n",
        "Variable length delay lines create an instantaneous pitch shift that can be useful in a variety of time modulation effects such as [vibrato](https://en.wikipedia.org/wiki/Vibrato), [chorus](https://en.wikipedia.org/wiki/Chorus_effect), and [flanging](https://en.wikipedia.org/wiki/Flanging). "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "F_mihqZZx_s_"
      },
      "outputs": [],
      "source": [
        "#@markdown Record or Upload Audio\n",
        "\n",
        "record_or_upload = \"Record\" #@param [\"Record\", \"Upload (.mp3 or .wav)\"]\n",
        "\n",
        "record_seconds =   5#@param {type:\"number\", min:1, max:10, step:1}\n",
        "\n",
        "if record_or_upload == \"Record\":\n",
        "  audio = record(seconds=record_seconds)\n",
        "else:\n",
        "  # Load audio sample here (.mp3 or .wav3 file)\n",
        "  # Just use the first file.\n",
        "  filenames, audios = upload()\n",
        "  audio = audios[0]\n",
        "\n",
        "# Add batch dimension\n",
        "audio = audio[np.newaxis, :]\n",
        "\n",
        "# Listen.\n",
        "specplot(audio)\n",
        "play(audio)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KSEpt_DVEGbZ"
      },
      "outputs": [],
      "source": [
        "def sin_phase(mod_rate):\n",
        "  \"\"\"Helper function.\"\"\"\n",
        "  n_samples = audio.size\n",
        "  n_seconds = n_samples / sample_rate\n",
        "  phase = tf.sin(tf.linspace(0.0, mod_rate * n_seconds * 2.0 * np.pi, n_samples))\n",
        "  return phase[tf.newaxis, :, tf.newaxis]\n",
        "\n",
        "def modulate_audio(audio, center_ms, depth_ms, mod_rate):\n",
        "  mod_delay = ddsp.effects.ModDelay(center_ms=center_ms,\n",
        "                                    depth_ms=depth_ms,\n",
        "                                    gain_scale_fn=None,\n",
        "                                    phase_scale_fn=None)\n",
        "\n",
        "  phase = sin_phase(mod_rate)  # Hz\n",
        "  gain = 1.0 * np.ones_like(audio)[..., np.newaxis]\n",
        "  audio_out = 0.5 * mod_delay(audio, gain, phase)\n",
        "\n",
        "  # Listen.\n",
        "  play(audio_out)\n",
        "  specplot(audio_out)\n",
        "\n",
        "# Three different effects.\n",
        "print('Flanger')\n",
        "modulate_audio(audio, center_ms=0.75, depth_ms=0.75, mod_rate=0.25)\n",
        "\n",
        "print('Chorus')\n",
        "modulate_audio(audio, center_ms=25.0, depth_ms=1.0, mod_rate=2.0)\n",
        "\n",
        "print('Vibrato')\n",
        "modulate_audio(audio, center_ms=25.0, depth_ms=12.5, mod_rate=5.0)\n"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "collapsed_sections": [
        "hMqWDc_m6rUC",
        "ZFIqwYGbZ-df",
        "6jXC-hm09dyl",
        "256dCv-T9xHi",
        "4HmOvGzs9zHB",
        "d-K69x2790ad",
        "C_lPZWTN92YJ",
        "Lay_6Ldw93ZL",
        "Lc_md-cD99y6",
        "mkjWUGpZ95Mr"
      ],
      "last_runtime": {},
      "name": "1_synths_and_effects.ipynb",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
